Rich Source-Side Context for Statistical Machine Translation
Abstract
We explore augmenting statistical machine translation models with features of the context of each phrase to be translated. This work extends several existing threads of research in statistical MT, including the use of context in example-based machine translation (Carl and Way, 2003) and the incorporation of word sense disambiguation into a translation model (Chan et al., 2007). The context features we consider use surrounding words and part-of-speech tags, local syntactic structure, and other properties of the source-language sentence to help predict each phrase's translation. Our approach requires very little computation beyond the standard phrase extraction algorithm and scales well to large data scenarios. We report significant improvements in automatic evaluation scores for Chinese-to-English and English-to-German translation, and also describe our entry in the WMT08 shared task based on this approach.
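The idea of conditioning a phrase's translation on its source-side context can be illustrated with a minimal sketch. It assumes, for illustration only, that each context feature is a single neighboring word or part-of-speech tag, and that a context-conditioned translation probability is estimated by relative frequency over the phrase pairs produced by ordinary phrase extraction. The names `context_features`, `ContextPhraseTable`, `left_word`, `right_tag`, and so on are hypothetical, and this is not presented as the paper's exact model.

```python
from collections import Counter
from typing import Dict, List


# Hypothetical feature set (illustration only): the word and POS tag
# immediately left and right of the source phrase, with sentence-boundary
# markers when the phrase touches an edge of the sentence.
def context_features(words: List[str], tags: List[str],
                     start: int, end: int) -> Dict[str, str]:
    """Context features for the source phrase words[start:end]."""
    return {
        "left_word": words[start - 1] if start > 0 else "<s>",
        "right_word": words[end] if end < len(words) else "</s>",
        "left_tag": tags[start - 1] if start > 0 else "<s>",
        "right_tag": tags[end] if end < len(tags) else "</s>",
    }


class ContextPhraseTable:
    """Relative-frequency estimates of p(target | source phrase, one context feature)."""

    def __init__(self) -> None:
        # (feature name, feature value, source phrase, target phrase) -> count
        self.joint: Counter = Counter()
        # (feature name, feature value, source phrase) -> count
        self.marginal: Counter = Counter()

    def observe(self, words: List[str], tags: List[str],
                start: int, end: int, target: str) -> None:
        """Record one extracted phrase pair together with its source-side context."""
        source = " ".join(words[start:end])
        for name, value in context_features(words, tags, start, end).items():
            self.joint[(name, value, source, target)] += 1
            self.marginal[(name, value, source)] += 1

    def prob(self, target: str, source: str, name: str, value: str) -> float:
        """p(target | source, name=value); 0.0 if this context was never seen."""
        denom = self.marginal[(name, value, source)]
        return self.joint[(name, value, source, target)] / denom if denom else 0.0


if __name__ == "__main__":
    table = ContextPhraseTable()
    # Two toy English-to-German phrase pairs for the ambiguous word "bank".
    table.observe(["the", "river", "bank", "flooded"], ["DT", "NN", "NN", "VBD"], 2, 3, "Ufer")
    table.observe(["the", "bank", "raised", "rates"], ["DT", "NN", "VBD", "NNS"], 1, 2, "Bank")
    print(table.prob("Ufer", "bank", "left_word", "river"))  # 1.0
    print(table.prob("Ufer", "bank", "right_tag", "VBD"))    # 0.5
```

In this toy run, the left-word feature `river` fully disambiguates "bank" toward "Ufer", while the right-tag feature `VBD` occurs with both translations and leaves them at 0.5 each. Because the counts are gathered in the same pass that enumerates extracted phrase pairs, the sketch adds only one counter update per feature per pair on top of extraction.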
Similar resources
Utilizing Source Context in Statistical Machine Translation
Current methods for statistical machine translation typically utilize only a limited context in the input sentence. Many language phenomena thus remain out of their reach: for example, long-distance agreement in morphologically rich languages and lexical selection often require information from the whole source sentence. In this work, we present an overview of approaches for including wider conte...
An Empirical Analysis of Source Context Features for Phrase-Based Statistical Machine Translation
Statistical phrase-based machine translation systems make little use of context information: while the language model takes target-side context into account, context information on the source side is typically not integrated into phrase-based translation systems. Translational features such as phrase translation probabilities are learned from phrase-translation pairs extracted from word-al...
Target-Side Context for Discriminative Models in Statistical Machine Translation
Discriminative translation models utilizing source context have been shown to help statistical machine translation performance. We propose a novel extension of this work using target context information. Surprisingly, we show that this model can be efficiently integrated directly in the decoding process. Our approach scales to large training data sizes and results in consistent improvements in ...
Statistical Machine Translation of English – Manipuri using Morpho-syntactic and Semantic Information
The English-Manipuri language pair is rarely investigated and has limited bilingual resources. The development of a factored Statistical Machine Translation (SMT) system between English as the source language and Manipuri, a morphologically rich language, as the target is reported. The roles of suffixes and dependency relations on the source side and of case markers on the target side are identified as im...
Enriching machine-mediated speech-to-speech translation using contextual information
Conventional approaches to speech-to-speech (S2S) translation typically ignore key contextual information such as prosody, emphasis, and discourse state. Capturing and exploiting such contextual information is especially important in machine-mediated S2S translation as it can serve as a complementary knowledge source that can potentially aid the end users in improved unde...